
    Software-defined datacenter network debugging

    Software-defined Networking (SDN) enables flexible network management, but as networks evolve toward large numbers of end-points with diverse network policies, higher speeds, and higher utilization, the abstraction that SDN introduces makes monitoring and debugging network problems increasingly hard. Some problems impact packet processing in the data plane (e.g., congestion), while others cause policy deployment failures (e.g., hardware bugs); both create inconsistency between operator intent and actual network behavior. Existing debugging tools are not sufficient to accurately detect, localize, and understand the root cause of problems observed in a large-scale network: they either lack in-network resources (compute, memory, and/or network bandwidth) or take a long time to debug network problems. This thesis presents three debugging tools, PathDump, SwitchPointer, and Scout, and a technique for tracing packet trajectories called CherryPick. We call for a different approach to network monitoring and debugging: in contrast to implementing debugging functionality entirely in-network, we should carefully partition the debugging tasks between end-hosts and network elements. In this direction, we present CherryPick, PathDump, and SwitchPointer. The core idea of CherryPick is to cherry-pick the links that are key to representing the end-to-end path of a packet, and to embed the picked linkIDs into the packet header on its way to the destination. PathDump is an end-host based network debugger built on tracing packet trajectories; it exploits resources at the end-hosts to implement various monitoring and debugging functionalities. PathDump currently runs over a real network consisting only of commodity hardware, and yet can support a surprisingly large class of network debugging problems with minimal in-network functionality. 
The key contribution of SwitchPointer is to efficiently provide network visibility to end-host based network debuggers like PathDump by using switch memory as a "directory service": each switch, rather than storing the telemetry data necessary for debugging functionalities, stores pointers to the end hosts where the relevant telemetry data is stored. Treating memory as a directory service makes it possible to solve performance problems that were hard or infeasible with existing designs. Finally, we present and solve a network policy fault localization problem that arises in operating policy management frameworks for a production network. We develop Scout, a fully automated system that localizes faults in a large-scale policy deployment and further pinpoints the physical-level failures that are the most likely cause of the observed faults.
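
The "directory service" idea in the abstract above can be illustrated with a minimal sketch. This is not the thesis's implementation: the class name `SwitchDirectory`, the fixed-length epochs, and the host labels are all assumptions made for illustration. The only point taken from the abstract is that a switch keeps pointers to end hosts rather than the telemetry data itself.

```python
from collections import defaultdict


class SwitchDirectory:
    """Hypothetical per-switch directory: epoch number -> end hosts seen.

    The switch stores only *pointers* (host identifiers); the telemetry
    itself is assumed to live at those end hosts.
    """

    def __init__(self, epoch_ms):
        self.epoch_ms = epoch_ms          # assumed fixed epoch length
        self.pointers = defaultdict(set)  # epoch -> set of host identifiers

    def on_packet(self, timestamp_ms, dst_host):
        # Record that dst_host holds telemetry for this epoch.
        self.pointers[timestamp_ms // self.epoch_ms].add(dst_host)

    def hosts_for(self, start_ms, end_ms):
        # Return the end hosts a debugger should query for a time window.
        first = start_ms // self.epoch_ms
        last = end_ms // self.epoch_ms
        hosts = set()
        for epoch in range(first, last + 1):
            hosts |= self.pointers.get(epoch, set())
        return hosts


switch = SwitchDirectory(epoch_ms=10)
switch.on_packet(3, "h1")
switch.on_packet(12, "h2")
switch.on_packet(25, "h3")
hosts_early = switch.hosts_for(0, 15)
hosts_late = switch.hosts_for(20, 29)
```

In this sketch a debugger would first ask the switch which hosts to contact for a time window and then fetch the actual records from those hosts.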

    Implementing ChaCha based crypto primitives on programmable SmartNICs

    Control and management plane applications such as serverless function orchestration and 4G/5G control plane functions are offloaded to smartNICs to reduce communication and processing latency. Such applications involve multiple inter-host interactions that were traditionally secured using SSL/TLS gRPC-based communication channels. Offloading the applications to a smartNIC implies that we must also offload the security algorithms. Otherwise, we need to send the application messages to the host VM/container for crypto operations, negating the offload benefits. We propose crypto externs for Netronome Agilio smartNICs that implement authentication and confidentiality (encryption/decryption) using the ChaCha stream cipher algorithm. AES and ChaCha are two popular cipher suites, but we chose ChaCha since none of the smartNICs have ChaCha-based crypto accelerators. However, smartNICs have a restricted instruction set and limited memory, making it difficult to implement security algorithms. This paper identifies and addresses several challenges in implementing ChaCha crypto primitives successfully. Our evaluations show that our crypto extern implementation satisfies the scalability requirements of popular applications such as serverless management functions and host in-band network telemetry. © 2022 ACM
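
For readers unfamiliar with ChaCha, its basic building block is the quarter round, which uses only 32-bit addition, XOR, and rotation. The sketch below is plain Python following RFC 8439, not the paper's smartNIC extern code; the function names are chosen here for illustration.

```python
MASK32 = 0xFFFFFFFF  # keep every intermediate value within 32 bits


def rotl32(value, shift):
    """Rotate a 32-bit word left by `shift` bits."""
    return ((value << shift) & MASK32) | (value >> (32 - shift))


def quarter_round(a, b, c, d):
    """ChaCha quarter round on four 32-bit words (RFC 8439, section 2.1)."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d


# Test vector from RFC 8439, section 2.1.1.
result = quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
```

Because the round needs no lookup tables or multiplications, it is a plausible fit for targets with a restricted instruction set, although the abstract does not describe how the authors map it onto the hardware.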

    IoT MUD enforcement in the edge cloud using programmable switch

    Targeted data breaches and cybersecurity attacks involving IoT devices are becoming ever more concerning. To combat these threats and risks, the IETF standardized Manufacturer Usage Description (MUD), which allows IoT device vendors to specify the intended communication patterns (MUD profile) of an IoT device. A MUD profile enables validation of the actual communication pattern of an IoT device against its intended behavior at run-time. However, the MUD specification was primarily intended for enforcement at the Local Area Network (LAN) of the IoT device, thus fragmenting the solution across multiple heterogeneous networks. MUD enforcement at higher levels in the network hierarchy (e.g., private edge for enterprise networks) eases security policy management and reduces processing overheads on the existing security infrastructure. Realizing MUD enforcement at the edge poses two main challenges: (1) how to identify an IoT device at the edge so that a device-specific MUD profile can be enforced on its traffic, and (2) how to scale MUD enforcement to a large network of IoT devices. In this paper, we present our approach to address these challenges and validate IoT device communication at the edge. To scale MUD enforcement to a large IoT network, we leverage the multi-stage pipeline architecture and stateful ALUs of a P4 programmable switch and process IoT traffic in the data plane. © 2022 ACM
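
The run-time validation step described above amounts to checking each observed flow against the device's allow-list. The sketch below is a simplified illustration in Python, not the paper's P4 data-plane design and not the MUD JSON format; the `Rule` type, the host names, and the flat (protocol, remote, port) matching are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical flattened allow-list entry derived from a MUD profile."""
    protocol: str
    remote: str
    port: int


def is_allowed(profile, protocol, remote, port):
    """Return True if the observed flow matches an entry in the profile."""
    return Rule(protocol, remote, port) in profile


# Illustrative profile for one device; the endpoints are placeholders.
camera_profile = frozenset({
    Rule("tcp", "cloud.example.com", 443),
    Rule("udp", "ntp.example.com", 123),
})

ok_flow = is_allowed(camera_profile, "tcp", "cloud.example.com", 443)
bad_flow = is_allowed(camera_profile, "tcp", "203.0.113.9", 23)
```

A flow that matches no rule, such as the Telnet connection in the last line, would be treated as a deviation from the intended behavior.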

    Special Issue on The Workshop on Performance of host-based Network Applications (PerfNA 2022)

    With the advancement of highly network-powered paradigms such as 5G and microservices, which are typically deployed as containers or VMs, there is a growing need for host nodes to perform specialized network tasks such as monitoring, filtering, tunneling, and load-balancing. While these tasks were traditionally performed using switches and specialized middleboxes in the network, there is now a demand to perform them on commodity hardware consisting of COTS servers. However, a major challenge is to perform these tasks with low overhead and high reliability while maintaining low latency, high throughput, and flexibility.

    A Case For Cross-Domain Observability to Debug Performance Issues in Microservices

    Many applications deployed in the cloud are refactored into small components called microservices that are deployed as containers in a Kubernetes environment. Such applications are deployed on a cluster of physical servers that are connected via the datacenter network. In such deployments, resources such as compute, memory, and network are shared, and hence some microservices (culprits) can misbehave and consume more resources. This interference among applications hosted on the same node leads to performance issues (e.g., high latency, packet loss) in other microservices (victims), followed by a delayed or low-quality response. Given the highly distributed and transient nature of the workloads, it is extremely challenging to debug performance issues, especially because existing monitoring tools collect traces and analyze them at individual points (network, host, etc.) in a disaggregated manner. In this paper, we make the case for a cross-domain (network and host) monitoring and debugging framework that could provide the end-to-end observability needed to debug performance issues of applications and pinpoint whether the root cause is on the sender host, the receiver host, or the network. We present the design and provide preliminary implementation details using eBPF (extended Berkeley Packet Filter) to demonstrate the feasibility of the system. © 2022 IEEE

    Anomaly Detection in Data Plane Systems using Packet Execution Paths

    Programmable data planes provide exciting opportunities to realize fast, accurate, and data-driven control-loop decisions. Many data plane systems have been proposed for handling network dynamics (e.g., congestion, failures) in near real-time. At the core of these systems are packet-processing data-plane algorithms that continuously monitor traffic and respond automatically. Despite its benefits, automatic response to network events leads to an increase in potential sources of input and, hence, an increase in attack surface. This paper takes a step towards securing such systems by (1) identifying possible attacks on recently proposed data-driven data-plane systems; and (2) designing a scalable tool for detecting such attacks at run time. Our approach models plausible expected behavior and uses the model as a reference to check whether the system is under attack. We conduct preliminary experiments to demonstrate the feasibility of our detection methodology. © 2021 ACM